Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

Abstract

In this paper, we study the problem of estimating uniformly well the mean values of several distributions given a finite budget of samples. If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance. However, in the more realistic case where the distributions are not known in advance, one needs to design adaptive sampling strategies in order to select which distribution to sample from according to the previously observed samples. We describe two strategies based on pulling the distributions a number of times that is proportional to a high-probability upper-confidence-bound on their variance (built from previous observed samples) and report a finite-sample performance analysis on the excess estimation error compared to the optimal allocation. We show that the performance of these allocation strategies depends not only on the variances but also on the full shape of the distributions.
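
The allocation idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm as published: the function name `ucb_variance_allocation`, the `[0, 1]` support, the two initial pulls per arm, and the Hoeffding-style bonus `sqrt(log(1/delta) / (2n))` are all assumptions made for illustration. At each step the sketch pulls the arm whose upper confidence bound on the variance, divided by its current number of pulls, is largest, so pull counts drift toward being proportional to the variance bounds.

```python
import math
import random


def ucb_variance_allocation(arms, budget, delta=0.05, seed=0):
    """Split `budget` pulls across `arms` using a variance upper bound.

    `arms` is a list of callables taking a random.Random and returning a
    value assumed to lie in [0, 1]. Returns (pull counts, empirical means).
    """
    rng = random.Random(seed)
    k = len(arms)
    # Assumption: two initial pulls per arm so a variance estimate exists.
    samples = [[arm(rng), arm(rng)] for arm in arms]
    for _ in range(budget - 2 * k):
        scores = []
        for xs in samples:
            n = len(xs)
            mean = sum(xs) / n
            var = sum((x - mean) ** 2 for x in xs) / n  # empirical variance
            # Illustrative confidence width, not the paper's constants.
            bonus = math.sqrt(math.log(1.0 / delta) / (2 * n))
            scores.append((var + bonus) / n)
        # Pull the arm whose bounded variance is least covered so far.
        i = max(range(k), key=scores.__getitem__)
        samples[i].append(arms[i](rng))
    counts = [len(xs) for xs in samples]
    means = [sum(xs) / len(xs) for xs in samples]
    return counts, means


if __name__ == "__main__":
    constant_arm = lambda rng: 0.5                         # zero variance
    coin_arm = lambda rng: 1.0 if rng.random() < 0.5 else 0.0  # variance 0.25
    print(ucb_variance_allocation([constant_arm, coin_arm], budget=200))
```

In this sketch the zero-variance arm still receives some pulls because of the confidence bonus, while the coin-flip arm receives the larger share of the budget.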
